Symbolic Representation of Algorithmic Game Semantics 



Aleksandar S. Dimovski 

Faculty of Information-Communication Tech., FON University, Skopje, 1000, MKD 
aleksandar.dimovski@fon.edu.mk

In this paper we revisit the regular-language representation of game semantics of second-order re- 
cursion free Idealized Algol with infinite data types. By using symbolic values instead of concrete 
ones we generalize the standard notion of regular-language and automata representations to that of 
corresponding symbolic representations. In this way terms with infinite data types, such as integers, 
can be expressed as finite symbolic-automata although the standard automata interpretation is infi- 
nite. Moreover, significant reductions of the state space of game semantics models are obtained. This 
enables efficient verification of terms, which is illustrated with several examples. 

1 Introduction 

Game semantics ffllUQSl is a technique for compositional modelling of programming languages, which 
gives both sound and complete (fully abstract) models. Types are interpreted by games (or arenas) 
between a Player, which represents the term being modelled, and an Opponent, which represents the 
environment in which the term is used. The two participants strictly alternate to make moves, each of 
which is either a question (a demand for information) or an answer (a supply of information). Compu- 
tations (executions of terms) are interpreted as plays of a game, while terms are expressed as strategies, 
i.e. sets of plays, for a game. It has been shown that game semantics models can be given certain kinds of
concrete automata-theoretic representations [9, 12, 13], and so they can serve as a basis for software model
checking and program analysis. However, the main limitation of model checking in general is that it can 
be applied only if a finite-state model is available. This problem arises when we want to handle terms 
with infinite data types. 

Regular-language representation of game semantics of second-order recursion-free Idealized Algol 
with finite data types provides algorithms for automatic verification of a range of properties, such as 
observational-equivalence, approximation, and safety. It has the disadvantage that in the presence of 
infinite integer data types the obtained automata become infinite state, i.e. regular-languages have infinite 
summations, thus losing their algorithmic properties. Similarly, large finite data types are likely to make 
the automata infeasible. In this paper we redefine the (standard) regular-language representation [12] at
a more abstract level so that terms with infinite data types can be represented as finite automata, and so 
various program properties can be checked over them. The idea is to transfer attention from the standard 
form of automata to what we call symbolic automata. The representation of values constitutes the main 
difference between these two formalisms. In symbolic automata, instead of assigning concrete values to 
identifiers occurring in terms, they are left as symbols. Operations involving such identifiers will also 
be left as symbols. Some of the symbols will be guarded by boolean expressions, which indicate under 
which conditions these symbols can be performed. 

The paper is organised as follows. The language we consider here is introduced in Section 2. Symbolic
representation of algorithmic game semantics is defined in Section 3. Its correctness and suitability
for verification of safety properties are shown in Section 4. In Section 5 we discuss some extensions of
the language, such as arrays, and how they can be represented in the symbolic model. A prototype tool,
which implements this translation, as well as some examples are described in Section 6.

Related work. By representing game semantic models as symbolic automata, we obtain a predicate
abstraction [14] based method for verification. A predicate abstraction from game semantics has also
been developed previously. This was enabled by extending the models produced using game semantics such
that the state (store) is recorded explicitly in the model by using so-called stateful plays. However, in 
our work we achieved predicate abstraction in a more natural way without changing the game semantic 
models, and also for terms with infinite data types. 

Symbolic techniques, in which data is not represented explicitly but symbolically, have found a 
number of applications, for example symbolic execution and verification of programs [4], symbolic
program analysis [5], and symbolic operational semantics of process algebras fl31 . 

2 The Language 

Idealized Algol (IA) [1, 2] is a well studied language which combines call-by-name λ-calculus with the
fundamental imperative features and locally-scoped variables. In this paper we work with its second-order
recursion-free fragment (IA2 for short).

The data types D are integers and booleans (D ::= int | bool). The base types B are expressions,
commands, and variables (B ::= expD | com | varD). We consider only first-order function types T 
(T ::= B | B -> T). 

Terms are formed by the following grammar: 

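$$
\begin{aligned}
M ::=\; & x \mid v \mid \mathsf{skip} \mid M \;\mathsf{op}\; M \mid M ; M \mid \mathsf{if}\ M\ \mathsf{then}\ M\ \mathsf{else}\ M \mid \mathsf{while}\ M\ \mathsf{do}\ M \\
\mid\; & M := M \mid\; !M \mid \mathsf{new}_D\ x := v\ \mathsf{in}\ M \mid \mathsf{mkvar}_D\, M\, M \mid \lambda x.M \mid M\, M
\end{aligned}
$$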

where v ranges over constants of type D. Expression constants are integers (an infinite set) and booleans. The
standard arithmetic-logic operations op are employed. We have the usual imperative constructs: sequential
composition, conditional, iteration, assignment, de-referencing, and the "do nothing" command skip.
Block-allocated local variables are introduced by the new construct, which initializes a variable and makes
it local to a given block. The constructor mkvar is used for creating "bad" variables. We have the standard
functional constructs for function definition and application. Well-typed terms are given by typing
judgements of the form $\Gamma \vdash M : T$, where $\Gamma$ is a type context consisting of a finite number of typed free
identifiers, i.e. of the form $x_1 : T_1, \ldots, x_k : T_k$. Typing rules of the language are given in [1, 12].

The operational semantics of our language is given for terms $\Gamma \vdash M : T$, such that all identifiers in $\Gamma$
are variables, i.e. $\Gamma = x_1 : \mathsf{var}D_1, \ldots, x_k : \mathsf{var}D_k$. It is defined by a big-step reduction relation:

$$\Gamma \vdash M, s \Longrightarrow V, s'$$

where $s$, $s'$ represent the state before and after reduction. The state is a function assigning data values
to the variables in $\Gamma$. We denote by $V$ terms in canonical form defined by
$V ::= x \mid v \mid \lambda x.M \mid \mathsf{skip} \mid \mathsf{mkvar}_D\, M\, N$. Reduction rules are standard (see [1, 12] for details).

Given a term $\Gamma \vdash M : \mathsf{com}$, where all identifiers in $\Gamma$ are variables, we say that $M$ terminates in state
$s$, written $M, s \Downarrow$, if $\Gamma \vdash M, s \Longrightarrow \mathsf{skip}, s'$ for some state $s'$. If $M$ is a closed term then we abbreviate the
relation $M, \emptyset \Downarrow$ with $M \Downarrow$. We say that a term $\Gamma \vdash M : T$ is an approximate of a term $\Gamma \vdash N : T$, denoted by
$\Gamma \vdash M \sqsubseteq N$, if and only if for all terms-with-hole $C[-] : \mathsf{com}$, such that $\vdash C[M] : \mathsf{com}$ and $\vdash C[N] : \mathsf{com}$
are well-typed closed terms of type com, if $C[M] \Downarrow$ then $C[N] \Downarrow$. If two terms approximate each other
they are considered observationally-equivalent, denoted by $\Gamma \vdash M \cong N$.

3 Symbolic Game Semantics

We start by introducing a number of syntactic categories necessary for construction of symbolic
automata. Let Sym be a countable set of symbolic names, ranged over by upper case letters X, Y, Z. For
any finite $W \subseteq Sym$, the function new(W) returns a minimal symbolic name which does not occur in W,
and sets $W := W \cup \{new(W)\}$. A minimal symbolic name not in W is the one which occurs earliest in a
fixed enumeration $X_1, X_2, \ldots$ of all possible symbolic names. A set of expressions Exp, ranged over by e,
is defined as follows:

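$$
\begin{aligned}
e &::= a \mid b \\
a &::= n \mid X^{\mathsf{int}} \mid a \;\mathsf{op}\; a \\
b &::= tt \mid ff \mid X^{\mathsf{bool}} \mid a = a \mid a < a \mid \neg b \mid b \wedge b
\end{aligned}
$$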

where a ranges over arithmetic expressions (AExp), and b over boolean expressions (BExp). We use 
superscripts to denote the data type of a symbolic name X. We will often omit to write them, when they 
are clear from the context. 
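
The behaviour of new(W) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the fixed enumeration of symbolic names is X1, X2, ... and that W is a mutable set; neither choice is prescribed by the text above.

```python
from itertools import count


def enumeration():
    """Assumed fixed enumeration X1, X2, ... of all symbolic names."""
    for i in count(1):
        yield f"X{i}"


def new(W):
    """Return the earliest name in the enumeration that is not in W,
    and record it in W (W is updated in place)."""
    for name in enumeration():
        if name not in W:
            W.add(name)
            return name


W = {"X1", "X3"}
first = new(W)   # earliest name missing from {X1, X3}
second = new(W)  # earliest name missing from {X1, X2, X3}
```

Because W is updated on every call, two successive calls never return the same name.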

Let $\mathcal{A}$ be an alphabet of letters. We define a symbolic alphabet $\mathcal{A}^{sym}$ induced by $\mathcal{A}$ as follows:

$$\mathcal{A}^{sym} = \mathcal{A} \cup \{?X, e \mid X \in Sym,\ e \in Exp\}$$

The letters of the form ?X are called input symbols. They generate new symbolic names, i.e. ?X means
let X = new(W) in X. We use $\alpha$ to range over $\mathcal{A}^{sym}$. Next we define a guarded alphabet $\mathcal{A}^{gu}$ induced
by $\mathcal{A}$ as the set of pairs of boolean conditions and symbolic letters, i.e. we have:

$$\mathcal{A}^{gu} = \{[b, \alpha\rangle \mid b \in BExp,\ \alpha \in \mathcal{A}^{sym}\}$$

A guarded letter $[b, \alpha\rangle$ means that the symbolic letter $\alpha$ occurs only if the boolean $b$ evaluates to true, i.e.
if (b = tt) then $\alpha$ else $\emptyset$. We use $\beta$ to range over $\mathcal{A}^{gu}$. We will often write $\alpha$ for the guarded letter $[tt, \alpha\rangle$.
A word $[b_1, \alpha_1\rangle \cdot [b_2, \alpha_2\rangle \cdots [b_n, \alpha_n\rangle$ over the guarded alphabet $\mathcal{A}^{gu}$ can be represented as a pair $[b, w\rangle$, where
$b = b_1 \wedge b_2 \wedge \ldots \wedge b_n$ is a boolean and $w = \alpha_1 \cdot \alpha_2 \cdots \alpha_n$ is a word of symbolic letters.
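
The passage from a word of guarded letters to a single pair [b, w⟩ can be sketched in Python as follows. The encoding is an assumption made only for this illustration: a guarded letter is a (condition, letter) tuple of strings, and trivially true conjuncts tt are dropped, which does not change the meaning of the conjunction.

```python
def as_pair(word):
    """Collapse [b1,a1> ... [bn,an> into (b1 and ... and bn, a1 ... an).

    `word` is a list of (condition, letter) tuples; conditions are kept
    as plain text and the conjunct "tt" is omitted as it is neutral."""
    conds = [cond for cond, _ in word if cond != "tt"]
    b = " and ".join(conds) if conds else "tt"
    w = [letter for _, letter in word]
    return b, w


play = [("X1 = 0", "run"), ("tt", "q"), ("X1 < Z1", "done")]
condition, letters = as_pair(play)
```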

We now show how IA2 with infinite integers is interpreted by symbolic automata, which will be
denoted by extended regular expressions. For simplicity the translation is defined for terms in β-normal
form. If a term has β-redexes, it is first reduced to β-normal form syntactically by substitution. In this
setting, types (arenas) are represented as guarded alphabets of moves, plays of a game as words over a
guarded alphabet, and strategies as symbolic automata (regular languages) over a guarded alphabet. The
symbolic automata and regular languages, denoted by $\mathcal{A}(R)$ and $\mathcal{L}(R)$ respectively, are specified using
extended regular expressions R. They are defined inductively over finite guarded alphabets $\mathcal{A}^{gu}$ using
the following operations:

$$
\begin{array}{lllllll}
\emptyset & \varepsilon & \beta & R \cdot R' & R^{*} & R + R' & R \cap R' \\
R|_{\mathcal{A}'^{gu}} & R[R'/w] & R^{\langle\alpha\rangle} & R' \fatsemi_{\mathcal{B}^{gu}} R & R \bowtie R' & &
\end{array}
$$

where $R, R'$ range over extended regular expressions, $\mathcal{A}^{gu}, \mathcal{B}^{gu}$ over finite guarded alphabets, $\beta \in \mathcal{A}^{gu}$,
$\alpha \in \mathcal{A}^{sym}$, $\mathcal{A}'^{gu} \subseteq \mathcal{A}^{gu}$ and $w \in \mathcal{A}^{gu*}$.

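Constants $\emptyset$, $\varepsilon$ and $\beta$ denote the languages $\emptyset$, $\{\varepsilon\}$ and $\{\beta\}$, respectively. Concatenation $R \cdot R'$, Kleene
star $R^{*}$, union $R + R'$ and intersection $R \cap R'$ are the standard operations. Restriction $R|_{\mathcal{A}'^{gu}}$ replaces all
symbolic letters from $\mathcal{A}'^{gu}$ with $\varepsilon$ in all words of R, but keeps all boolean conditions. Substitution
$R[R'/w]$ is the language of R where all occurrences of the subword w have been replaced by the words
of R'. Given two symbols $\alpha \in \mathcal{A}^{sym}$, $\beta \in \mathcal{B}^{gu}$, $\beta^{\langle\alpha\rangle}$ is a new letter obtained by tagging the latter with
the former. If a letter is tagged more than once, we write $(\beta^{\langle\alpha_1\rangle})^{\langle\alpha_2\rangle} = \beta^{\langle\alpha_2,\alpha_1\rangle}$. We define the alphabet
$\mathcal{A}^{gu\langle\alpha\rangle} = \{\beta^{\langle\alpha\rangle} \mid \beta \in \mathcal{A}^{gu}\}$. Composition of regular expressions $R'$ defined over $\mathcal{A}^{gu\langle 1\rangle} + \mathcal{B}^{gu\langle 2\rangle}$ and $R$
over $\mathcal{B}^{gu\langle 2\rangle} + \mathcal{C}^{gu\langle 3\rangle}$ is given as follows:

$$
\begin{aligned}
R' \fatsemi_{\mathcal{B}^{gu\langle 2\rangle}} R = \big\{\, & w\big[\,[b \wedge b_1 \wedge b_2 \wedge b'_1 \wedge b'_2 \wedge \alpha_1 = \alpha'_1 \wedge \alpha_2 = \alpha'_2,\ s\rangle^{\langle 1\rangle} \,/\, [b_1, \alpha_1\rangle^{\langle 2\rangle} \cdot [b_2, \alpha_2\rangle^{\langle 2\rangle}\big] \\
& \mid\ w \in R,\ [b'_1, \alpha'_1\rangle^{\langle 2\rangle} \cdot [b, s\rangle^{\langle 1\rangle} \cdot [b'_2, \alpha'_2\rangle^{\langle 2\rangle} \in R' \,\big\}
\end{aligned}
$$

where $R'$ is a set of words of the form $[b'_1, \alpha'_1\rangle^{\langle 2\rangle} \cdot [b, s\rangle^{\langle 1\rangle} \cdot [b'_2, \alpha'_2\rangle^{\langle 2\rangle}$, such that $[b'_1, \alpha'_1\rangle^{\langle 2\rangle}, [b'_2, \alpha'_2\rangle^{\langle 2\rangle} \in \mathcal{B}^{gu\langle 2\rangle}$
and $[b, s\rangle$ contains only letters from $\mathcal{A}^{gu\langle 1\rangle}$. So all letters of $\mathcal{B}^{gu\langle 2\rangle}$ are removed from the composition,
which is defined over the alphabet $\mathcal{A}^{gu\langle 1\rangle} + \mathcal{C}^{gu\langle 3\rangle}$. The shuffle operation of two regular languages is
defined as $\mathcal{L}(R) \bowtie \mathcal{L}(R') = \bigcup_{w_1 \in \mathcal{L}(R),\, w_2 \in \mathcal{L}(R')} w_1 \bowtie w_2$, where $w \bowtie \varepsilon = \varepsilon \bowtie w = w$ and
$a \cdot w_1 \bowtie b \cdot w_2 = a \cdot (w_1 \bowtie b \cdot w_2) + b \cdot (a \cdot w_1 \bowtie w_2)$. It is a standard result that any extended regular expression obtained
from the operations above denotes a regular language [12, pp. 11-12], which can be recognised by a
finite (symbolic) automaton [11].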
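
The recursive definition of the shuffle of two words translates directly into code. The following Python sketch assumes words are tuples of letters; it illustrates the word-level equations only and does not construct an automaton.

```python
def shuffle(w1, w2):
    """All interleavings of the words w1 and w2 (tuples of letters),
    following  w >< eps = eps >< w = w  and
    a.w1 >< b.w2 = a.(w1 >< b.w2) + b.(a.w1 >< w2)."""
    if not w1:
        return {w2}
    if not w2:
        return {w1}
    a, b = w1[0], w2[0]
    left = {(a,) + w for w in shuffle(w1[1:], w2)}
    right = {(b,) + w for w in shuffle(w1, w2[1:])}
    return left | right


interleavings = shuffle(("a", "b"), ("c",))
```

For two words of lengths m and n over disjoint letters, the result contains every order-preserving interleaving, of which there are "m + n choose m".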

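Each type T is interpreted by a guarded alphabet of moves $\mathcal{A}^{gu}_{[\![T]\!]}$ induced by $\mathcal{A}_{[\![T]\!]}$. The alphabet $\mathcal{A}_{[\![T]\!]}$
contains two kinds of moves: questions and answers. They are defined as follows.

$$
\begin{aligned}
\mathcal{A}_{[\![\mathsf{int}]\!]} &= \{\ldots, -n, -n+1, \ldots, n, n+1, \ldots\} & \mathcal{A}_{[\![\mathsf{bool}]\!]} &= \{tt, ff\} \\
\mathcal{A}_{[\![\mathsf{exp}D]\!]} &= \{q\} \cup \mathcal{A}_{[\![D]\!]} & \mathcal{A}_{[\![\mathsf{com}]\!]} &= \{run, done\} \\
\mathcal{A}_{[\![\mathsf{var}D]\!]} &= \{read, write(a), a, ok \mid a \in \mathcal{A}_{[\![D]\!]}\} & & \\
\mathcal{A}_{[\![B_1^{\langle 1\rangle} \to \cdots \to B_k^{\langle k\rangle} \to B]\!]} &= \sum_{1 \le i \le k} \mathcal{A}^{\langle i\rangle}_{[\![B_i]\!]} + \mathcal{A}_{[\![B]\!]} & &
\end{aligned}
$$

Note that function types are tagged by a superscript ($\langle i\rangle$) in order to keep record from which type,
i.e. which component of the disjoint union, each move comes from. The letters in the alphabet $\mathcal{A}_{[\![T]\!]}$
represent moves (observable actions) that a term of type T can perform. For example, in $\mathcal{A}_{[\![\mathsf{exp}D]\!]}$ there is
a question move q to ask for the value of the expression, and values from $\mathcal{A}_{[\![D]\!]}$ to answer the question.
For commands, in $\mathcal{A}_{[\![\mathsf{com}]\!]}$ there is a question move run to initiate a command, and an answer move done
to signal successful termination of a command. For variables, we have moves for writing to the variable,
write(a), acknowledged by the move ok, and for reading from the variable, a question move read, and
corresponding to it an answer from $\mathcal{A}_{[\![D]\!]}$.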

For any (β-normal) term, we define a regular-language which represents its game semantics, i.e. its
set of complete plays. Every complete play represents the observable effects of a completed computation
of the given term. It is given as a guarded word $[b, w\rangle$, where the boolean $b$ is also called the play condition.
Assumptions about a play (computation) to be feasible are recorded in the play condition. For infeasible
plays, the play condition is inconsistent (unsatisfiable), thus no assignment of concrete values to symbolic
names exists that makes the play condition true. So it is desirable for any play to check the consistency
(satisfiability) of its play condition. If the play condition is found to be inconsistent, this play is discarded
from the final model of the corresponding term. The regular expression for $\Gamma \vdash M : T$ is denoted
$[\![\Gamma \vdash M : T]\!]$, and it is defined over the guarded alphabet $\mathcal{A}^{gu}_{[\![\Gamma \vdash T]\!]}$ defined as:

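$$\mathcal{A}^{gu}_{[\![\Gamma \vdash T]\!]} = \Big(\sum_{x : T' \in \Gamma} \mathcal{A}^{gu\langle x\rangle}_{[\![T']\!]}\Big) + \mathcal{A}^{gu}_{[\![T]\!]}$$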

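Free identifiers $x : T \in \Gamma$ are represented by the copy-cat regular expressions given in Table 1, which
contain all possible behaviours of terms of that type. They provide a generic closure of an open program
term. For example, $x : \mathsf{exp}D^{\langle x\rangle} \vdash x : \mathsf{exp}D$ is modelled by the word $q \cdot q^{\langle x\rangle} \cdot ?X^{\langle x\rangle} \cdot X$. Its meaning is that
Opponent starts the play by asking what is the value of this expression with the move $q$, and Player
responds by playing $q^{\langle x\rangle}$ (i.e. what is the value of the non-local expression x). Then Opponent provides the
value of x by using a new symbolic name X, which will be also the value of this expression. Languages
$R^{\langle x,i\rangle}_{B_i}$ contain plays representing a function which evaluates its i-th argument.

$$
\begin{aligned}
& [\![\Gamma, x : B_1^{\langle x,1\rangle} \to \cdots \to B_k^{\langle x,k\rangle} \to \mathsf{exp}D^{\langle x\rangle} \vdash x : B_1^{\langle 1\rangle} \to \cdots \to B_k^{\langle k\rangle} \to \mathsf{exp}D]\!] = \\
& \qquad q \cdot q^{\langle x\rangle} \cdot \Big(\sum_{1 \le i \le k} R^{\langle x,i\rangle}_{B_i}\Big)^{*} \cdot ?X^{\langle x\rangle} \cdot X \\
& [\![\Gamma, x : B_1^{\langle x,1\rangle} \to \cdots \to B_k^{\langle x,k\rangle} \to \mathsf{com}^{\langle x\rangle} \vdash x : B_1^{\langle 1\rangle} \to \cdots \to B_k^{\langle k\rangle} \to \mathsf{com}]\!] = \\
& \qquad run \cdot run^{\langle x\rangle} \cdot \Big(\sum_{1 \le i \le k} R^{\langle x,i\rangle}_{B_i}\Big)^{*} \cdot done^{\langle x\rangle} \cdot done \\
& [\![\Gamma, x : B_1^{\langle x,1\rangle} \to \cdots \to B_k^{\langle x,k\rangle} \to \mathsf{var}D^{\langle x\rangle} \vdash x : B_1^{\langle 1\rangle} \to \cdots \to B_k^{\langle k\rangle} \to \mathsf{var}D]\!] = \\
& \qquad \Big(read \cdot read^{\langle x\rangle} \cdot \Big(\sum_{1 \le i \le k} R^{\langle x,i\rangle}_{B_i}\Big)^{*} \cdot ?Z^{\langle x\rangle} \cdot Z\Big) + \Big(write(?Z') \cdot write(Z')^{\langle x\rangle} \cdot \Big(\sum_{1 \le i \le k} R^{\langle x,i\rangle}_{B_i}\Big)^{*} \cdot ok^{\langle x\rangle} \cdot ok\Big) \\
& R^{\langle x,i\rangle}_{\mathsf{exp}D} = q^{\langle x,i\rangle} \cdot q^{\langle i\rangle} \cdot ?Z^{\langle i\rangle} \cdot Z^{\langle x,i\rangle} \\
& R^{\langle x,i\rangle}_{\mathsf{com}} = run^{\langle x,i\rangle} \cdot run^{\langle i\rangle} \cdot done^{\langle i\rangle} \cdot done^{\langle x,i\rangle} \\
& R^{\langle x,i\rangle}_{\mathsf{var}D} = \big(read^{\langle x,i\rangle} \cdot read^{\langle i\rangle} \cdot ?Z^{\langle i\rangle} \cdot Z^{\langle x,i\rangle}\big) + \big(write(?Z')^{\langle x,i\rangle} \cdot write(Z')^{\langle i\rangle} \cdot ok^{\langle i\rangle} \cdot ok^{\langle x,i\rangle}\big)
\end{aligned}
$$

Table 1: Free Identifiers

$$
\begin{aligned}
[\![\Gamma \vdash v : \mathsf{exp}D]\!] &= q \cdot v \\
[\![\Gamma \vdash \mathsf{skip} : \mathsf{com}]\!] &= run \cdot done \\
[\![\Gamma \vdash c(M_1, \ldots, M_k) : B']\!] &= [\![\Gamma \vdash M_1 : B_1^{\langle 1\rangle}]\!] \fatsemi_{\mathcal{A}^{gu\langle 1\rangle}_{[\![B_1]\!]}} \cdots [\![\Gamma \vdash M_k : B_k^{\langle k\rangle}]\!] \fatsemi_{\mathcal{A}^{gu\langle k\rangle}_{[\![B_k]\!]}} [\![c : B_1^{\langle 1\rangle} \times \cdots \times B_k^{\langle k\rangle} \to B']\!] \\
[\![\Gamma \vdash M N : T]\!] &= [\![\Gamma \vdash N : B^{\langle 1\rangle}]\!] \fatsemi_{\mathcal{A}^{gu\langle 1\rangle}_{[\![B]\!]}} [\![\Gamma \vdash M : B^{\langle 1\rangle} \to T]\!] \\
[\![\Gamma \vdash \mathsf{new}_D\ x := v\ \mathsf{in}\ M : B]\!] &= \Big([\![\Gamma, x : \mathsf{var}D \vdash M]\!] \cap \big(\gamma^{x}_{v} \bowtie (\mathcal{A}^{gu}_{[\![\Gamma \vdash B]\!]})^{*}\big)\Big)\Big|_{\mathcal{A}^{gu\langle x\rangle}_{[\![\mathsf{var}D]\!]}} \\
\gamma^{x}_{v} &= (read^{\langle x\rangle} \cdot v^{\langle x\rangle})^{*} \cdot \big(write(?Z)^{\langle x\rangle} \cdot ok^{\langle x\rangle} \cdot (read^{\langle x\rangle} \cdot Z^{\langle x\rangle})^{*}\big)^{*}
\end{aligned}
$$

Table 2: Language terms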

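Note that whenever an input symbol ?X is met in a play, a new symbolic name is created, which
binds all occurrences of X that follow in the play until a new ?X is met. For example,
$[\![f : \mathsf{expint}^{\langle f,1\rangle} \to \mathsf{expint}^{\langle f\rangle} \vdash f : \mathsf{expint}^{\langle 1\rangle} \to \mathsf{expint}]\!] = q \cdot q^{\langle f\rangle} \cdot (q^{\langle f,1\rangle} \cdot q^{\langle 1\rangle} \cdot ?Z^{\langle 1\rangle} \cdot Z^{\langle f,1\rangle})^{*} \cdot ?X^{\langle f\rangle} \cdot X$
is a model for a non-local function f which may evaluate its argument zero or more times. The play
corresponding to f which evaluates its argument two times is given as:
$q \cdot q^{\langle f\rangle} \cdot q^{\langle f,1\rangle} \cdot q^{\langle 1\rangle} \cdot ?Z_1^{\langle 1\rangle} \cdot Z_1^{\langle f,1\rangle} \cdot q^{\langle f,1\rangle} \cdot q^{\langle 1\rangle} \cdot ?Z_2^{\langle 1\rangle} \cdot Z_2^{\langle f,1\rangle} \cdot ?X^{\langle f\rangle} \cdot X$.
Note that letters tagged with f represent the actions of calling and returning from the function, while
letters tagged with f,1 are the actions caused by evaluating the first argument of f.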
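
To make the binding of input symbols concrete, the following Python sketch unrolls the starred part of the model for f a given number of times, giving each iteration its own fresh name. The textual encoding of moves (for instance `q<f,1>` for a move q tagged with f,1) and the names Z1, Z2, ... are assumptions made for this illustration.

```python
def f_play(n):
    """Play of the non-local function f that evaluates its argument n times.

    Each unrolling of the starred sub-expression meets a new input
    symbol ?Z, so the i-th iteration gets its own symbolic name Zi."""
    play = ["q", "q<f>"]
    for i in range(1, n + 1):
        play += ["q<f,1>", "q<1>", f"?Z{i}<1>", f"Z{i}<f,1>"]
    play += ["?X<f>", "X"]
    return play


twice = f_play(2)
```

With n = 2 this yields the twelve-letter play described above, and with n = 0 the shortest play, in which f ignores its argument.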

In Table 2 terms are interpreted by regular expressions describing their sets of complete plays. An
integer or boolean constant is modeled by a play where the initial question q is answered by the value
of that constant. The only play for skip responds to run with done. A composite term $c(M_1, \ldots, M_k)$
consisting of a language construct 'c' and subterms $M_1, \ldots, M_k$ is interpreted by composing the regular
expressions for $M_1, \ldots, M_k$, and a regular expression for 'c'. The representation of language constructs
'c' is given in Table 3. In the definition for local variables, a 'cell' regular expression $\gamma^{x}_{v}$ is used to
remember the initial and the most-recently written value into the variable x. Notice that all symbols used
in Tables 1, 2 and 3 are of data type D, except the symbol Z in the if and while constructs, which is of data type
bool.

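We define an effective alphabet of a regular expression to be the set of all letters appearing in the
language denoted by that regular expression. Then we can show the following.

$$
\begin{aligned}
[\![\mathsf{op} : \mathsf{exp}D^{\langle 1\rangle} \times \mathsf{exp}D^{\langle 2\rangle} \to \mathsf{exp}D]\!] &= q \cdot q^{\langle 1\rangle} \cdot ?Z^{\langle 1\rangle} \cdot q^{\langle 2\rangle} \cdot ?Z'^{\langle 2\rangle} \cdot (Z \;\mathsf{op}\; Z') \\
[\![\,; : \mathsf{com}^{\langle 1\rangle} \times \mathsf{com}^{\langle 2\rangle} \to \mathsf{com}]\!] &= run \cdot run^{\langle 1\rangle} \cdot done^{\langle 1\rangle} \cdot run^{\langle 2\rangle} \cdot done^{\langle 2\rangle} \cdot done \\
[\![\mathsf{if} : \mathsf{expbool}^{\langle 1\rangle} \times \mathsf{com}^{\langle 2\rangle} \times \mathsf{com}^{\langle 3\rangle} \to \mathsf{com}]\!] &= [tt, run\rangle \cdot [tt, q^{\langle 1\rangle}\rangle \cdot [tt, ?Z^{\langle 1\rangle}\rangle \cdot \\
& \quad \big([Z, run^{\langle 2\rangle}\rangle \cdot [tt, done^{\langle 2\rangle}\rangle + [\neg Z, run^{\langle 3\rangle}\rangle \cdot [tt, done^{\langle 3\rangle}\rangle\big) \cdot [tt, done\rangle \\
[\![\mathsf{while} : \mathsf{expbool}^{\langle 1\rangle} \times \mathsf{com}^{\langle 2\rangle} \to \mathsf{com}]\!] &= [tt, run\rangle \cdot [tt, q^{\langle 1\rangle}\rangle \cdot [tt, ?Z^{\langle 1\rangle}\rangle \cdot \\
& \quad \big([Z, run^{\langle 2\rangle}\rangle \cdot [tt, done^{\langle 2\rangle}\rangle \cdot [tt, q^{\langle 1\rangle}\rangle \cdot [tt, ?Z^{\langle 1\rangle}\rangle\big)^{*} \cdot [\neg Z, done\rangle \\
[\![:= \ : \mathsf{var}D^{\langle 1\rangle} \times \mathsf{exp}D^{\langle 2\rangle} \to \mathsf{com}]\!] &= run \cdot q^{\langle 2\rangle} \cdot ?Z^{\langle 2\rangle} \cdot write(Z)^{\langle 1\rangle} \cdot ok^{\langle 1\rangle} \cdot done \\
[\![\,! : \mathsf{var}D^{\langle 1\rangle} \to \mathsf{exp}D]\!] &= q \cdot read^{\langle 1\rangle} \cdot ?Z^{\langle 1\rangle} \cdot Z
\end{aligned}
$$

Table 3: Language constructs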

Proposition 1. For any term $\Gamma \vdash M : T$, the effective alphabet of $[\![\Gamma \vdash M : T]\!]$ is a finite subset of $\mathcal{A}^{gu}_{[\![\Gamma \vdash T]\!]}$.

Any term $\Gamma \vdash M : T$ from IA2 with infinite integers is interpreted by an extended regular expression
without infinite summations defined over a finite alphabet. So the following is immediate.

Theorem 1. For any IA2 term, the set $\mathcal{L}[\![\Gamma \vdash M : T]\!]$ is a symbolic regular-language without infinite
summations over a finite alphabet. Moreover, a finite symbolic automaton $\mathcal{A}[\![\Gamma \vdash M : T]\!]$ which recognizes
it is effectively constructible.

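Proof. The proof is by induction on the structure of $\Gamma \vdash M : T$.

An automaton is a tuple $(Q, i, \delta, F)$ where $Q$ is the finite set of states, $i \in Q$ is the initial state, $\delta$ is the
transition function, and $F \subseteq Q$ is the set of final states. We now introduce two auxiliary operations. Let
$A' = (Q', i', \delta', F')$ be an automaton, then $A = rename(A', tag)$ is defined as:

$$
\begin{aligned}
Q &= Q' \qquad i = i' \qquad F = F' \\
\delta &= \{\, q_1 \xrightarrow{[b,m\rangle} q_2 \in \delta' \mid q_1 \neq i',\ q_2 \notin F' \,\} \;+ \\
& \quad\; \{\, i' \xrightarrow{[b,m^{\langle tag\rangle}\rangle} q \mid i' \xrightarrow{[b,m\rangle} q \in \delta' \,\} \;+\; \{\, q_1 \xrightarrow{[b,m^{\langle tag\rangle}\rangle} q_2 \mid q_1 \xrightarrow{[b,m\rangle} q_2 \in \delta',\ q_2 \in F' \,\}
\end{aligned}
$$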

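Let $A_1 = (Q_1, i_1, \delta_1, F_1)$ and $A_2 = (Q_2, i_2, \delta_2, F_2)$ be two automata, such that all transitions going out
of $i_2$ and going to a state from $F_2$ are tagged with tag. Define $A = compose(A_1, A_2, tag)$ as follows:

$$
\begin{aligned}
Q &= Q_1 + Q_2 \setminus \{i_2, F_2\} \qquad i = i_1 \qquad F = F_1 \\
\delta &= \{\, q_1 \xrightarrow{[b,m\rangle} q'_1 \in \delta_1 \mid m \neq n^{\langle tag\rangle} \,\} \;+\; \{\, q_2 \xrightarrow{[b,m\rangle} q'_2 \in \delta_2 \mid m \neq n^{\langle tag\rangle} \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[b_1 \wedge b_2 \wedge m_1 = m_2,\ \varepsilon\rangle} q'_2 \mid q_1 \xrightarrow{[b_1, m_1^{\langle tag\rangle}\rangle} q'_1 \in \delta_1,\ i_2 \xrightarrow{[b_2, m_2^{\langle tag\rangle}\rangle} q'_2 \in \delta_2,\ \{m_1, m_2\} \text{ are questions} \,\} \;+ \\
& \quad\; \{\, q_2 \xrightarrow{[b_1 \wedge b_2 \wedge m_1 = m_2,\ \varepsilon\rangle} q'_1 \mid q_1 \xrightarrow{[b_1, m_1^{\langle tag\rangle}\rangle} q'_1 \in \delta_1,\ q_2 \xrightarrow{[b_2, m_2^{\langle tag\rangle}\rangle} q'_2 \in \delta_2,\ q'_2 \in F_2,\ \{m_1, m_2\} \text{ are answers} \,\}
\end{aligned}
$$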

Let $A_M$, $A_N$, and $A_{;}$ be automata representing $\Gamma \vdash M$, $\Gamma \vdash N$, and the construct ; (see Table 3), respectively.
The unique automaton representing $\Gamma \vdash M ; N$ is defined as:

$$A_{M;N} = compose(compose(A_{;}, rename(A_M, 1), 1), rename(A_N, 2), 2)$$

The other cases for constructs are similar.

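The automaton $A = (Q, i, \delta, F)$ for $[\![\Gamma \vdash \mathsf{new}_D\ x := v\ \mathsf{in}\ M]\!]$ is constructed in two stages. First we
eliminate x-tagged symbolic letters from $A_M = (Q_M, i_M, \delta_M, F_M)$, which represents $[\![\Gamma, x : \mathsf{var}D \vdash M]\!]$, by
replacing them with $\varepsilon$. We introduce a new symbolic name X to keep track of what changes to x are
made by each x-tagged move.

$$
\begin{aligned}
Q_\varepsilon &= Q_M \qquad i_\varepsilon = i_M \qquad F_\varepsilon = F_M \\
\delta_\varepsilon &= \{\, i_M \xrightarrow{[?X = v \wedge b,\ m\rangle} q \mid i_M \xrightarrow{[b,m\rangle} q \in \delta_M \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[b,m\rangle} q_2 \mid q_1 \xrightarrow{[b,m\rangle} q_2 \in \delta_M,\ m \notin \{write(a)^{\langle x\rangle}, ok^{\langle x\rangle}, read^{\langle x\rangle}, a^{\langle x\rangle}\} \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[?X = a' \wedge b_1 \wedge b_2,\ \varepsilon\rangle} q_2 \mid \exists q.\,(q_1 \xrightarrow{[b_1, write(a')^{\langle x\rangle}\rangle} q \in \delta_M,\ q \xrightarrow{[b_2, ok^{\langle x\rangle}\rangle} q_2 \in \delta_M) \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[a' = X \wedge b_1 \wedge b_2,\ \varepsilon\rangle} q_2 \mid \exists q.\,(q_1 \xrightarrow{[b_1, read^{\langle x\rangle}\rangle} q \in \delta_M,\ q \xrightarrow{[b_2, a'^{\langle x\rangle}\rangle} q_2 \in \delta_M) \,\}
\end{aligned}
$$

The final automaton is obtained by eliminating $\varepsilon$-letters from $A_\varepsilon$. Note that conditions associated to
$\varepsilon$-letters are not removed.

$$
\begin{aligned}
Q &= Q_\varepsilon \qquad i = i_\varepsilon \qquad F = F_\varepsilon \\
\delta &= \big(\delta_\varepsilon \setminus \{\, q_1 \xrightarrow{[b,\varepsilon\rangle} q_2 \mid q_1, q_2 \in Q_\varepsilon \,\}\big) \;+\; \{\, q_1 \xrightarrow{[b_\varepsilon \wedge b,\ m\rangle} q_2 \mid \exists q' \in Q_\varepsilon.\,(q_1 \xrightarrow{[b_\varepsilon, \varepsilon\rangle}{}^{*}\ q',\ q' \xrightarrow{[b,m\rangle} q_2) \,\}
\end{aligned}
$$

We write $q_1 \xrightarrow{[b_\varepsilon, \varepsilon\rangle}{}^{*}\ q_2$ if $q_2$ is reachable from $q_1$ by a series of $\varepsilon$-transitions $[b_1, \varepsilon\rangle, \ldots, [b_k, \varepsilon\rangle$, where
$b_\varepsilon = b_1 \wedge \ldots \wedge b_k$. □

Figure 1: The symbolic representation of the strategy for $M_1$.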

Example 1. Consider the term $M_1$:

$$f : \mathsf{com}^{\langle f,1\rangle} \to \mathsf{com}^{\langle f\rangle},\ abort : \mathsf{com}^{\langle abort\rangle},\ x : \mathsf{expint}^{\langle x\rangle},\ y : \mathsf{expint}^{\langle y\rangle} \vdash f(\mathsf{if}\ (x \neq y)\ \mathsf{then}\ abort) : \mathsf{com}$$

in which f is a non-local procedure, and x, y are non-local expressions.

The strategy for this term represented as a finite symbolic automaton is shown in Figure 1. The model
illustrates only the possible behaviors of this term: the non-local procedure f may call its argument, zero
or more times, then the term terminates successfully with done. If f calls its argument, arbitrary values
for x and y are read from the environment by using symbols X and Y. If they are different ($X \neq Y$),
then the abort command is executed. The standard regular-language representation [12] of $M_1$, where
concrete values are employed, is given in Figure 2. It represents an infinite-state automaton, and so it is
not suitable for automatic verification (model checking). Note that the values for non-local expressions
x and y can be any possible integer. □



4 Formal Properties 

In [12, pp. 28-32], the correctness of the standard regular-language representation for
finitary IA2 was established by showing that it is isomorphic to the game semantics model [1]. As a corollary,
the standard regular-language representation is fully abstract.

Let $[\![\Gamma \vdash M : T]\!]^{CR}$ denote the set of all complete plays in the strategy for a term $\Gamma \vdash M : T$ from
IA2 with infinite integers obtained as in [12], where concrete values in moves and infinite summations in
regular expressions are used. Suppose that there is a special free identifier abort of type com. A term
is abort-free if it has no occurrence of abort. We say that a term is safe if for any abort-free term-with-hole
$C[-]$, the term $C[M]$ does not execute the abort command. Since the standard regular-language
semantics is fully abstract, the following result is easy to show.

Proposition 2. A term M is safe if $[\![\Gamma \vdash M]\!]^{CR}$ does not contain moves from $\mathcal{A}^{\langle abort\rangle}_{[\![\mathsf{com}]\!]}$.

Let Eval be the set of evaluations, i.e. the set of total functions from W to $\mathcal{A}_{[\![\mathsf{int}]\!]} \cup \mathcal{A}_{[\![\mathsf{bool}]\!]}$. We use $\rho$ to
range over Eval. So we have $\rho(X^D) \in \mathcal{A}_{[\![D]\!]}$ for any evaluation $\rho$ and $X^D \in W$. Given a word of symbolic
letters $w$, let $\rho(w)$ be a word where every symbolic name is replaced by the corresponding concrete
value as defined by $\rho$. Given a guarded word $[b, w\rangle$, define $\rho([b, w\rangle) = \rho(w)$ if $\rho(b) = tt$; otherwise
$\rho([b, w\rangle) = \emptyset$ if $\rho(b) = ff$. The concretization of a symbolic regular-language over a guarded alphabet is
defined as follows: $\gamma\mathcal{L}(R) = \{\rho[b, w\rangle \mid [b, w\rangle \in \mathcal{L}(R),\ \rho \in Eval\}$. Let $[\![\Gamma \vdash M : T]\!]^{SR} = \mathcal{L}[\![\Gamma \vdash M : T]\!]$
be the strategy obtained as in Section 3, where symbols instead of concrete values are used.
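
The effect of an evaluation on a single guarded word can be sketched in Python. The encoding below is an assumption for illustration only: a symbolic letter is a (payload, tag) pair, the play condition is a Python predicate over the evaluation, and `None` stands for the empty result of an infeasible play.

```python
def concretize(cond, word, rho):
    """Apply the evaluation rho to the guarded word [cond, word>.

    Returns the concrete word when rho satisfies the play condition,
    and None (standing for the empty set) otherwise."""
    if not cond(rho):
        return None
    # Payloads that are symbolic names are replaced; moves are kept.
    return [(rho.get(payload, payload), tag) for payload, tag in word]


# Symbolic play q . q<x> . X<x> . X for a free identifier x : expint.
word = [("q", ""), ("q", "x"), ("X", "x"), ("X", "")]
concrete = concretize(lambda rho: True, word, {"X": 7})
infeasible = concretize(lambda rho: rho["X"] > 10, word, {"X": 7})
```

Ranging over all evaluations of X in this way yields the infinite family of concrete plays that the single symbolic play stands for.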
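Theorem 2. For any IA2 term $\Gamma \vdash M : T$,

$$\gamma[\![\Gamma \vdash M : T]\!]^{SR} = [\![\Gamma \vdash M : T]\!]^{CR}$$

Proof. By induction on the typing rules. The definitions of expression and command constructs are the
same.

Consider the case of free identifiers.

$$
\begin{aligned}
\gamma[\![x : \mathsf{exp}D^{\langle x\rangle} \vdash x : \mathsf{exp}D]\!]^{SR} &= \gamma\{q \cdot q^{\langle x\rangle} \cdot ?X^{D\langle x\rangle} \cdot X^{D}\} \\
&= \{q \cdot q^{\langle x\rangle} \cdot \rho(X^{D})^{\langle x\rangle} \cdot \rho(X^{D}) \mid \rho : \{X^{D}\} \to \mathcal{A}_{[\![D]\!]}\} \\
&= \{q \cdot q^{\langle x\rangle} \cdot v^{\langle x\rangle} \cdot v \mid v \in \mathcal{A}_{[\![D]\!]}\} = [\![x : \mathsf{exp}D^{\langle x\rangle} \vdash x : \mathsf{exp}D]\!]^{CR}
\end{aligned}
$$

The other cases are proved similarly. □

As a corollary we obtain the following result.
Theorem 3. $[\![\Gamma \vdash M : T]\!]^{SR}$ is safe iff $[\![\Gamma \vdash M : T]\!]^{CR}$ is safe.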

By Proposition 2 and Theorem 3 it follows that a term is safe if its symbolic regular-language
semantics is safe. Since symbolic automata are finite state, it follows that we can use model-checking to verify
safety of IA2 terms with infinite data types.

In order to verify safety of a term we need to check whether the symbolic automaton representing
the term contains unsafe plays. We use the external SMT solver Yices¹ [11] to determine the consistency of
the play conditions of the discovered unsafe plays. If some play condition is consistent, i.e. there exists
an evaluation $\rho$ that makes the play condition true, the corresponding unsafe play is feasible and it is
reported as a genuine counter-example.

1 http://yices.csl.sri.com 

Figure 3: The strategy for $M_2$.

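Example 2. The term $M_1$ from Example 1 is abort-unsafe, with the following counter-example:

$$run \cdot run^{\langle f\rangle} \cdot run^{\langle f,1\rangle} \cdot q^{\langle x\rangle} \cdot X^{\langle x\rangle} \cdot q^{\langle y\rangle} \cdot Y^{\langle y\rangle} \cdot [X \neq Y, run^{\langle abort\rangle}\rangle \cdot done^{\langle abort\rangle} \cdot done^{\langle f,1\rangle} \cdot done^{\langle f\rangle} \cdot done$$

The consistency of the play condition is established by instructing Yices to check the formula:

(define X::int)
(define Y::int)
(assert (/= X Y))

The following satisfiable assignments to symbols are reported: X = 1 and Y = 2, yielding a concrete
unsafe play: $run \cdot run^{\langle f\rangle} \cdot run^{\langle f,1\rangle} \cdot q^{\langle x\rangle} \cdot 1^{\langle x\rangle} \cdot q^{\langle y\rangle} \cdot 2^{\langle y\rangle} \cdot run^{\langle abort\rangle} \cdot done^{\langle abort\rangle} \cdot done^{\langle f,1\rangle} \cdot done^{\langle f\rangle} \cdot done$. □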

Example 3. Consider the term $M_2$:

$$
\begin{aligned}
N : \mathsf{expint}^{\langle N\rangle},\ abort : \mathsf{com}^{\langle abort\rangle} \vdash\ & \mathsf{new}_{\mathsf{int}}\ x := 0\ \mathsf{in} \\
& \mathsf{while}\ (x < N)\ \mathsf{do}\ x := x + 1; \\
& \mathsf{if}\ (x > 0)\ \mathsf{then}\ abort\ : \mathsf{com}
\end{aligned}
$$

The strategy for this term (suitably adapted for readability) is given in Figure 3. Observe that the
term communicates with its environment using the non-local identifiers N and abort. So only the actions
of N and abort are represented in the model. Notice that each time the term (Player) asks for a value of
N with the move $q^{\langle N\rangle}$, the environment (Opponent) provides a new fresh value ?Z for it. The symbol X is
used to keep track of the current value of x. Whenever a new value for N is provided, the term has three
possible options depending on the current values of Z and X: it can terminate successfully with done; it
can execute abort and terminate; or it can run the assignment x := x + 1 and ask for a new value of N.

The shortest unsafe play found in the model is:

$$[X = 0, run\rangle \cdot q^{\langle N\rangle} \cdot Z^{\langle N\rangle} \cdot [X \geq Z \wedge X > 0, run^{\langle abort\rangle}\rangle \cdot done^{\langle abort\rangle} \cdot done$$

But the play condition for it, $X = 0 \wedge X \geq Z \wedge X > 0$, is inconsistent. The next unsafe play is:

$$[X_1 = 0, run\rangle \cdot q^{\langle N\rangle} \cdot Z_1^{\langle N\rangle} \cdot [X_1 < Z_1 \wedge X_2 = X_1 + 1, q^{\langle N\rangle}\rangle \cdot Z_2^{\langle N\rangle} \cdot [X_2 \geq Z_2 \wedge X_2 > 0, run^{\langle abort\rangle}\rangle \cdot done^{\langle abort\rangle} \cdot done$$

Now Yices reports that the condition for this play is satisfiable, yielding a possible assignment of concrete
values to symbols that makes the condition true: $X_1 = 0$, $Z_1 = 1$, $X_2 = 1$, $Z_2 = 0$. So it is a genuine
counter-example, such that one corresponding concrete unsafe play is:
$run \cdot q^{\langle N\rangle} \cdot 1^{\langle N\rangle} \cdot q^{\langle N\rangle} \cdot 0^{\langle N\rangle} \cdot run^{\langle abort\rangle} \cdot done^{\langle abort\rangle} \cdot done$.
This play corresponds to a computation which runs the body of while exactly once.

Let us modify the $M_2$ term as follows

$$\mathsf{new}_{\mathsf{int}}\ x := 0\ \mathsf{in}\ \mathsf{while}\ (x < N)\ \mathsf{do}\ x := x + 1;\ \mathsf{if}\ (x > k)\ \mathsf{then}\ abort$$

where $k > 0$ is any positive integer. The model for this modified term is the same as shown in Figure 3,
except that the conditions associated with letters $run^{\langle abort\rangle}$ (resp., done) are $X \geq Z \wedge X > k$ (resp.,
$X \geq Z \wedge X \leq k$). In this case the $(k+1)$ shortest unsafe plays in the model are found to be inconsistent. The
first consistent unsafe play corresponds to executing the body of while $(k+1)$ times, and one possible
concrete representation of it (as generated by Yices) is:

$$run \cdot q^{\langle N\rangle} \cdot 1^{\langle N\rangle} \cdot q^{\langle N\rangle} \cdot 2^{\langle N\rangle} \cdots q^{\langle N\rangle} \cdot (k+1)^{\langle N\rangle} \cdot q^{\langle N\rangle} \cdot 0^{\langle N\rangle} \cdot run^{\langle abort\rangle} \cdot done^{\langle abort\rangle} \cdot done$$

□
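
The consistency checks of Example 3 can be mimicked without an SMT solver on such small conditions. The Python sketch below is only a stand-in for the Yices call: it searches a small box of integers (the bounds -3..3 are an arbitrary assumption), so a result of `None` means that no model exists inside the box, not that the condition is unsatisfiable in general. The loop-exit guard is read as X ≥ Z, the negation of the test x < N.

```python
from itertools import product


def find_model(names, cond, lo=-3, hi=3):
    """Search the box [lo, hi]^n for an assignment satisfying cond.

    A brute-force substitute for an SMT query, usable only for tiny
    conditions; returns a dict or None."""
    for values in product(range(lo, hi + 1), repeat=len(names)):
        rho = dict(zip(names, values))
        if cond(rho):
            return rho
    return None


# Shortest unsafe play of M2: X = 0 and X >= Z and X > 0.
first = find_model(
    ["X", "Z"],
    lambda r: r["X"] == 0 and r["X"] >= r["Z"] and r["X"] > 0,
)

# Next unsafe play: the loop body is executed exactly once.
second = find_model(
    ["X1", "Z1", "X2", "Z2"],
    lambda r: r["X1"] == 0 and r["X1"] < r["Z1"]
    and r["X2"] == r["X1"] + 1
    and r["X2"] >= r["Z2"] and r["X2"] > 0,
)
```

The first condition has no model at all, since X = 0 contradicts X > 0, whereas the second has many; any of them, including the assignment quoted in the example, gives a concrete unsafe play.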

5 Extensions 

We now extend the language with arrays of length k > 0. They can be handled in two ways. Firstly, we
can introduce arrays as syntactic sugar by using existing term formers. An array x[k] is represented as a
set of k distinct variables x[0], x[1], ..., x[k−1], such that

    x[E] = if E = 0 then x[0] else
           ...
           if E = k−1 then x[k−1] else skip (abort)

If we want to verify whether array out-of-bounds errors are present in the term, i.e. there is an attempt
to access elements out of the bounds of an array, we execute abort instead of skip when E ≥ k. This
approach for handling arrays is taken by the standard representation of game semantics [12, 9].
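
The syntactic-sugar reading of x[E] can be made concrete with a small Python sketch that prints the nested conditional for a given concrete length k. The function name and the textual output format are hypothetical and serve only to illustrate the expansion.

```python
def desugar_index(x, e, k, out_of_bounds="skip"):
    """Expand x[e] for an array of concrete length k into nested
    conditionals over the k variables x[0], ..., x[k-1].

    Pass out_of_bounds="abort" to flag out-of-bounds accesses."""
    term = out_of_bounds
    for i in reversed(range(k)):
        term = f"if {e} = {i} then {x}[{i}] else ({term})"
    return term


expanded = desugar_index("x", "E", 2)
checked = desugar_index("x", "E", 1, out_of_bounds="abort")
```

The size of the expansion grows linearly with k, which is one reason this encoding requires k to be a concrete number.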

Secondly, since we work with symbols we can have more efficient representation of arrays with 
unconstrained length. While in the first approach the length of an array k must be a concrete positive 
integer, in the second approach k can be represented by a symbol. We use the support that Yices provides 
for arrays by enabling: function definitions, function updates, and lambda expressions. For each array 
x[k] : varD, we can define a function symbol X (X : int → D) in Yices as:

(define X::(-> int D))

The function symbol X can be initialized and updated as follows:

(lambda (index::int) val)
(update X (index) val)

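A non-local array element is expressed as follows.

$$
\begin{aligned}
[\![\Gamma, x[k] \vdash x[E] : \mathsf{var}D]\!] &= [\![\Gamma \vdash E : \mathsf{expint}^{\langle 1\rangle}]\!] \fatsemi_{\mathcal{A}^{gu\langle 1\rangle}_{[\![\mathsf{expint}]\!]}} [\![\Gamma, x[k] \vdash x[-] : \mathsf{var}D]\!] \\
[\![\Gamma, x[k] \vdash x[-] : \mathsf{var}D]\!] &= read \cdot q^{\langle 1\rangle} \cdot ?Z^{\langle 1\rangle} \cdot [Z < k, read^{\langle x[Z]\rangle}\rangle \cdot ?Z'^{\langle x[Z]\rangle} \cdot Z' \;+ \\
& \quad\; write(?Z') \cdot q^{\langle 1\rangle} \cdot ?Z^{\langle 1\rangle} \cdot [Z < k, write(Z')^{\langle x[Z]\rangle}\rangle \cdot ok^{\langle x[Z]\rangle} \cdot ok
\end{aligned}
$$

If we want to check for array out-of-bounds errors, we extend this interpretation by including plays that
perform moves associated with the abort command when $Z \geq k$. For example, the de-referencing (reading)
part of the interpretation will be given as follows:

$$read \cdot q^{\langle 1\rangle} \cdot ?Z^{\langle 1\rangle} \cdot \big([Z < k, read^{\langle x[Z]\rangle}\rangle \cdot ?Z'^{\langle x[Z]\rangle} \cdot Z' + [Z \geq k, run^{\langle abort\rangle}\rangle \cdot done^{\langle abort\rangle} \cdot 0\big)$$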

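The automaton $A$ for $[\![\Gamma \vdash \mathsf{new}_D\ x[k] := v\ \mathsf{in}\ M]\!]$, where $A_M$ represents $[\![\Gamma, x[k] \vdash M]\!]$, is obtained as
follows. We first construct $A_\varepsilon$ by eliminating x-tagged moves from $A_M$.

$$
\begin{aligned}
Q_\varepsilon &= Q_M \qquad i_\varepsilon = i_M \qquad F_\varepsilon = F_M \\
\delta_\varepsilon &= \{\, i_M \xrightarrow{[X(j) := v \wedge b,\ m\rangle} q \mid i_M \xrightarrow{[b,m\rangle} q \in \delta_M \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[b,m\rangle} q_2 \mid q_1 \xrightarrow{[b,m\rangle} q_2 \in \delta_M,\ m \notin \{write(a)^{\langle x[a']\rangle}, ok^{\langle x[a']\rangle}, read^{\langle x[a']\rangle}, a^{\langle x[a']\rangle}\} \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[X(a') := a \wedge b_1 \wedge b_2,\ \varepsilon\rangle} q_2 \mid \exists q.\,(q_1 \xrightarrow{[b_1, write(a)^{\langle x[a']\rangle}\rangle} q \in \delta_M,\ q \xrightarrow{[b_2, ok^{\langle x[a']\rangle}\rangle} q_2 \in \delta_M) \,\} \;+ \\
& \quad\; \{\, q_1 \xrightarrow{[a = X(a') \wedge b_1 \wedge b_2,\ \varepsilon\rangle} q_2 \mid \exists q.\,(q_1 \xrightarrow{[b_1, read^{\langle x[a']\rangle}\rangle} q \in \delta_M,\ q \xrightarrow{[b_2, a^{\langle x[a']\rangle}\rangle} q_2 \in \delta_M) \,\}
\end{aligned}
$$

We use $X(j) := v$ to mean that the function symbol X is initialized to v for all its arguments, while
$X(a') := a$ means that X at argument $a'$ is updated to $a$. The final automaton $A$ is generated by removing
$\varepsilon$-letters from $A_\varepsilon$, in the same way as was done for the case of $\mathsf{new}_D$ in Theorem 1.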



6 Implementation 

We have developed a prototype tool in Java, called Symbolic GameChecker, which automatically
converts an IA2 term with integers into a symbolic automaton which represents its game semantics. The
model is then used to verify safety of the term. Further examples as well as detailed reports of how they
execute on Symbolic GameChecker are available from:
http://www.dcs.warwick.ac.uk/~aleks/symbolicgc.htm.

Along with the tool we have also implemented in Java our own library of classes for working with
symbolic automata. We could not simply reuse one of the existing libraries for finite-state automata, due to
the specific nature of the symbolic automata we use. The symbolic automaton generated by the tool is checked
for safety. We use the breadth-first search algorithm to find the shortest unsafe play in the model. Then
Yices is called to check consistency of its condition. If the condition is found to be consistent, the
unsafe play is reported; otherwise we search for another unsafe play. If no unsafe play is discovered or
all unsafe plays are found to be inconsistent, then the term is deemed safe. The tool also uses a simple
forward reachability algorithm to remove all unreachable states of a symbolic automaton.
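
The search loop described above can be sketched in Python. This is not the code of Symbolic GameChecker: the automaton encoding (a dictionary from states to guarded transitions), the bound on play length, the brute-force `find_model` that stands in for Yices, and the toy automaton are all assumptions made for illustration. In particular, the toy guards refer to fixed names and do not model the re-binding of input symbols along loops.

```python
from collections import deque
from itertools import product


def unsafe_plays(delta, init, finals, is_unsafe, max_len=8):
    """Breadth-first enumeration of accepted plays that contain an
    unsafe letter, shortest first, up to max_len letters."""
    queue = deque([(init, ())])
    while queue:
        state, play = queue.popleft()
        if state in finals and any(is_unsafe(l) for _, l in play):
            yield play
        if len(play) < max_len:
            for cond, letter, nxt in delta.get(state, []):
                queue.append((nxt, play + ((cond, letter),)))


def find_model(conds, names, lo=-2, hi=2):
    """Stand-in for the SMT call: bounded search for an evaluation
    that satisfies every guard collected along a play."""
    for values in product(range(lo, hi + 1), repeat=len(names)):
        rho = dict(zip(names, values))
        if all(cond(rho) for cond in conds):
            return rho
    return None


def first_counterexample(delta, init, finals, is_unsafe, names):
    """Return (letters, evaluation) for the shortest feasible unsafe
    play, or None if every enumerated unsafe play is infeasible."""
    for play in unsafe_plays(delta, init, finals, is_unsafe):
        rho = find_model([cond for cond, _ in play], names)
        if rho is not None:
            return [letter for _, letter in play], rho
    return None


def tt(rho):
    return True


# Hypothetical toy model: state -> [(guard, letter, next state)].
delta = {
    0: [(tt, "run", 1)],
    1: [(lambda r: r["X"] == 0 and r["X"] > 0, "run_abort", 2),
        (tt, "q_N", 3)],
    2: [(tt, "done_abort", 4)],
    3: [(lambda r: r["X"] < r["Z"], "run_abort", 2)],
    4: [(tt, "done", 5)],
}
result = first_counterexample(
    delta, 0, {5}, lambda l: l == "run_abort", ["X", "Z"])
```

In the toy model the shortest unsafe play is rejected because its condition has no model, and the search moves on to the next one, mirroring the report-or-continue behaviour described in the text.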

Let us consider the following implementation of the linear search algorithm. 

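$$
\begin{aligned}
& x[k] : \mathsf{varint}^{\langle x[-]\rangle},\ y : \mathsf{expint}^{\langle y\rangle},\ abort : \mathsf{com}^{\langle abort\rangle} \vdash \\
& \quad \mathsf{new}_{\mathsf{int}}\ i := 0\ \mathsf{in} \\
& \quad \mathsf{new}_{\mathsf{int}}\ p := y\ \mathsf{in} \\
& \quad \mathsf{while}\ (i < k)\ \mathsf{do}\ \{ \\
& \quad\quad \mathsf{if}\ (x[i] = p)\ \mathsf{then}\ abort; \\
& \quad\quad i := i + 1;\ \} \\
& \quad : \mathsf{com}
\end{aligned}
$$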

The program first remembers the input expression y into a local variable p. The non-local array x is 
then searched for an occurrence of the value stored in p. If the search succeeds, then abort is executed. 

Figure 4: The symbolic model for linear search.

Table 4: Verification of the linear search with finite data



The symbolic model for this term is shown in Fig. 4, where for simplicity array out-of-bounds errors
are not taken into consideration. If the value read from the environment for y has occurred in x, then an
unsafe behaviour of the term exists. So this term is unsafe, and the following counter-example is found:

$$
\begin{aligned}
& [I_1 = 0 \wedge k > 0, run\rangle \cdot q^{\langle y\rangle} \cdot Y^{\langle y\rangle} \cdot [P = Y \wedge I_1 < k, read^{\langle x[I_1]\rangle}\rangle \cdot Z^{\langle x[I_1]\rangle} \cdot \\
& [Z = P, run^{\langle abort\rangle}\rangle \cdot done^{\langle abort\rangle} \cdot [I_2 = I_1 + 1 \wedge I_2 \geq k, done\rangle
\end{aligned}
$$

This play corresponds to a term with an array x of size k = 1, where the values read from x[0] and y are
equal.

Overall, the symbolic model for the linear search term has 9 states and the total time needed to generate
the model and test its safety is less than 1 sec. We can compare this approach with the tool in [9], where
the standard representation based on CSP process algebra of terms with finite data types is used. We
performed experiments for the linear search term with different sizes of k and all integer types replaced
by finite data types. The types of x, y, and p are $\mathsf{int}_n$, i.e. they contain n distinct values {0, ..., n−1}, and
the type of the index i is $\mathsf{int}_{k+1}$, i.e. one more than the size of the array. Such a term was converted into
a CSP process [9], and then the FDR model checker was used to generate its model and test its safety.
Experimental results are shown in Table 4, where we list the execution time in seconds, and the size of
the final model in number of states. The model size and the time increase very fast as we increase the sizes
of k and n. We ran FDR and Symbolic GameChecker on a machine with an AMD Phenom II X4 940 processor and
4GB RAM.

7 Conclusion

We have shown how to reduce the verification of safety of game-semantics infinite-state models of IA2 
terms to the checking of the more abstract finite symbolic automata. 

Counter-example guided abstraction refinement procedures (ARP) [7, 8] can also be used for verifi- 
cation of terms with infinite integers. However, they find solutions after performing a few iterations in 
order to adjust integer identifiers to suitable abstractions. In each iteration, one abstract term is checked. 
If an abstract term needs larger abstractions, then it is likely to obtain a model with very large state space, 
which is difficult (infeasible) to generate and check automatically. The symbolic approach presented in 
this paper provides solutions in only one iteration, by checking symbolic models which are significantly 
smaller than the abstract models in ARP. The possibility to handle arrays with unconstrained length is another
important benefit of this approach. Extensions to nondeterministic [10] and concurrent [13] terms
would be interesting to consider.